X/Twitter Intelligence Scraper Pro
Pricing
from $1.00 / 1,000 results
Scrape public X/Twitter posts and profiles from search terms, handles, tweet URLs, and lists. Export clean tweet, author, media, engagement, and monitoring data for research, marketing, and social listening.
Developer
Muhammad Qaseem Iqbal
Maintained by Community
X/Twitter Intelligence Scraper Pro
Collect public X/Twitter data for monitoring, research, reporting, dashboards, spreadsheets, and AI workflows.
This actor can scrape public search results, profile timelines, tweet URLs, lists, conversations, and recurring monitoring runs. By default it uses cheap HTTP requests; when browser fallback is enabled, it launches a browser only if X/Twitter blocks simple page access or does not return enough data.
Important: X/Twitter often changes how public pages load. Cheap HTTP-only runs may return warning records instead of tweets when X serves a generic page. For more reliable results, use http_first with browserFallbackOnEmpty: true.
TL;DR
Want to scrape tweets from a profile and get real results reliably? Start here:
{"scrapeMode": "profiles","twitterHandles": ["NASA"],"maxItems": 10,"maxItemsPerProfile": 10,"crawlStrategy": "http_first","browserFallbackOnEmpty": true,"maxConcurrency": 1,"downloadMedia": false}
Want the cheapest possible test run first?
{"scrapeMode": "search","searchTerms": ["from:NASA lang:en"],"maxItems": 10,"crawlStrategy": "http_only","downloadMedia": false}
If the cheapest run returns X_ACCESS_WARNING or NO_TWEETS_FOUND, switch to the first example with browser fallback enabled.
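The cheap-first workflow can be sketched in Python as a small decision helper. This is only an illustration and not part of the actor: it assumes warning records carry their code in a field named code (the documented example only shows recordType on tweet records), and both helper names are hypothetical.

```python
# Codes named in the text; the "code" field name on warning records is an assumption.
FALLBACK_CODES = {"X_ACCESS_WARNING", "NO_TWEETS_FOUND"}

# The cheapest test input from this section, as a Python dict.
CHEAP_INPUT = {
    "scrapeMode": "search",
    "searchTerms": ["from:NASA lang:en"],
    "maxItems": 10,
    "crawlStrategy": "http_only",
    "downloadMedia": False,
}


def needs_browser_fallback(records):
    """Return True if any dataset record carries a code that calls for a retry."""
    return any(record.get("code") in FALLBACK_CODES for record in records)


def with_browser_fallback(run_input):
    """Return a copy of the input switched to http_first with browser fallback."""
    upgraded = dict(run_input)
    upgraded["crawlStrategy"] = "http_first"
    upgraded["browserFallbackOnEmpty"] = True
    return upgraded
```

A caller would run the cheap input first, pass the resulting dataset items to needs_browser_fallback, and rerun with with_browser_fallback(CHEAP_INPUT) only when it returns True.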
What You Can Collect
| Data source | What to enter | Example |
|---|---|---|
| Search results | Search terms or advanced X/Twitter queries | from:NASA lang:en |
| Profile timelines | One or more handles | NASA, OpenAI, apify |
| Tweet details | Tweet URLs or tweet IDs | https://x.com/NASA/status/... |
| Start URLs | Profile, tweet, list, or search URLs | https://x.com/NASA |
| Lists | Public X/Twitter list URLs | https://x.com/i/lists/... |
| Conversations | Tweet IDs or tweet URLs | Use when you want replies/context |
| Monitoring | Repeated search/profile runs | Emit only newly seen tweets |
| User discovery | Author aggregation from matching tweets | Useful for audience research |
Common Use Cases
- Track mentions of a brand, product, event, or person.
- Collect posts from public profiles for research or reporting.
- Monitor public conversations around keywords or hashtags.
- Build datasets for dashboards, spreadsheets, or BI tools.
- Prepare clean tweet text and metadata for AI search, chatbots, and RAG workflows. RAG means retrieval-augmented generation, a common way to give AI apps source data to search.
- Watch for new posts over time with persistent monitoring.
- Export structured tweet records to Apify datasets.
How It Works
- Choose a scrape mode, such as search, profiles, URLs, tweet details, conversations, or monitoring.
- Add search terms, handles, tweet IDs, or X/Twitter URLs.
- Set maxItems to control how many records you want.
- Choose a crawl strategy:
  - http_only: lowest cost, fastest, but may return warnings if X blocks public HTML.
  - http_first: tries the cheap method first, then can use a browser if enabled.
  - browser_only: highest extraction effort, usually higher cost.
- Run the actor and download your results from the dataset.
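If you generate inputs from a script, the steps above can be sketched as a small builder that rejects crawl strategies other than the three listed. The function name build_input is hypothetical, and the actor itself may validate input differently.

```python
import json

# The three crawl strategies listed in the steps above.
CRAWL_STRATEGIES = {"http_only", "http_first", "browser_only"}


def build_input(scrape_mode, max_items, crawl_strategy="http_only", **sources):
    """Assemble an input object, rejecting crawl strategies the text does not list."""
    if crawl_strategy not in CRAWL_STRATEGIES:
        raise ValueError(f"unknown crawlStrategy: {crawl_strategy!r}")
    run_input = {
        "scrapeMode": scrape_mode,
        "maxItems": max_items,
        "crawlStrategy": crawl_strategy,
    }
    # Extra keyword arguments are passed through as input fields,
    # for example searchTerms=[...] or twitterHandles=[...].
    run_input.update(sources)
    return run_input


profile_input = build_input(
    "profiles",
    10,
    "http_first",
    twitterHandles=["NASA"],
    browserFallbackOnEmpty=True,
)
print(json.dumps(profile_input, indent=2))
```

The printed JSON can be pasted into the actor's input editor.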
Recommended Settings
| Goal | Recommended settings |
|---|---|
| Cheapest test | crawlStrategy: "http_only", maxItems: 10, downloadMedia: false |
| More reliable profile scraping | crawlStrategy: "http_first", browserFallbackOnEmpty: true |
| Keep costs low | Use maxConcurrency: 1, avoid media downloads, start with small maxItems |
| Get more search results | Try sort: "top" or sort: "latest_and_top" |
| Avoid duplicate tweets | Keep deduplicateBy: "tweetId" |
| AI-ready output | Enable includeRagFields: true to add fields that are easier for AI apps to search |
| Spreadsheet-friendly output | Enable flattenOutput: true or choose outputFields |
Example Inputs
Scrape a Profile
{"scrapeMode": "profiles","twitterHandles": ["NASA"],"maxItems": 25,"maxItemsPerProfile": 25,"crawlStrategy": "http_first","browserFallbackOnEmpty": true}
Search for Tweets
{"scrapeMode": "search","searchTerms": ["\"artificial intelligence\" lang:en -filter:retweets"],"maxItems": 100,"sort": "latest_and_top","crawlStrategy": "http_first","browserFallbackOnEmpty": true}
Use a Date Range
{"scrapeMode": "search","searchTerms": ["from:NASA since:2026-01-01 until:2026-06-01"],"maxItems": 200,"sort": "top","dateSplitStrategy": "monthly"}
Mix Handles, URLs, and Search Terms
{"scrapeMode": "auto","searchTerms": ["@apify lang:en"],"twitterHandles": ["NASA"],"startUrls": [{ "url": "https://x.com/OpenAI" }],"tweetIds": ["1728108619189874825"],"maxItems": 100}
Monitor New Results Over Time
{"scrapeMode": "monitoring","searchTerms": ["\"launch announcement\" lang:en"],"maxItems": 50,"monitoring": {"enabled": true,"stateKey": "launch-monitor","emitOnlyNewItems": true,"lookbackHours": 24},"deduplicateScope": "persistent"}
Prepare Data for AI Search or Chatbots
{"scrapeMode": "profiles","twitterHandles": ["NASA"],"maxItems": 50,"includeRagFields": true,"cleanText": true,"includeRawData": false}
Main Input Options
| Field | Plain-English meaning | Default |
|---|---|---|
| scrapeMode | What kind of run to perform. Use auto if you are mixing inputs. | auto |
| searchTerms | Search queries to run on X/Twitter. Supports advanced search syntax. | [] |
| twitterHandles | Public profile handles to scrape. @ is optional. | [] |
| startUrls | Public X/Twitter URLs, including profiles, tweets, lists, and searches. | [] |
| tweetIds | Tweet IDs to fetch directly. | [] |
| maxItems | Maximum number of output records for the run. | 100 |
| maxItemsPerQuery | Maximum records per search query, URL, or list. | 100 |
| maxItemsPerProfile | Maximum records per profile timeline. | 100 |
| sort | Search order: latest, top, both, or automatic fallback. | latest |
| profileMode | Which profile tab to use, such as tweets, replies, or media. | tweets |
| filters | Optional filters for language, engagement, media, links, replies, and retweets. | {} |
| crawlStrategy | How the actor loads pages: cheap HTTP, HTTP first, or browser only. | http_only |
| browserFallbackOnEmpty | Use a browser if the cheap request returns no tweets or an access warning. | false |
| downloadMedia | Download images/videos to storage instead of only collecting metadata. | false |
| flattenOutput | Make nested fields easier to use in CSV/spreadsheets. | false |
| outputFields | Keep only selected fields in the final output. | [] |
| includeRagFields | Add AI-friendly text chunks and metadata for search/chatbot workflows. | false |
| monitoring | Save state between runs and emit only new items. | disabled |
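To preview the effective configuration of a partial input, the defaults from the table can be merged with your own values. This is a local convenience sketch with a hypothetical function name; the actor applies its own defaults at run time, and the monitoring entry is left out because the table only describes it as disabled.

```python
import copy

# Defaults copied from the table above (monitoring omitted: listed as "disabled").
DEFAULTS = {
    "scrapeMode": "auto",
    "searchTerms": [],
    "twitterHandles": [],
    "startUrls": [],
    "tweetIds": [],
    "maxItems": 100,
    "maxItemsPerQuery": 100,
    "maxItemsPerProfile": 100,
    "sort": "latest",
    "profileMode": "tweets",
    "filters": {},
    "crawlStrategy": "http_only",
    "browserFallbackOnEmpty": False,
    "downloadMedia": False,
    "flattenOutput": False,
    "outputFields": [],
    "includeRagFields": False,
}


def resolve_input(user_input):
    """Return the table defaults overridden by the user's values."""
    resolved = copy.deepcopy(DEFAULTS)  # deep copy so list defaults are never shared
    resolved.update(copy.deepcopy(user_input))
    return resolved


effective = resolve_input({"twitterHandles": ["NASA"], "maxItems": 10})
```

Here effective keeps crawlStrategy at http_only and browserFallbackOnEmpty at false, which is why a minimal input behaves like the cheapest test run.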
Output
Results are saved to the Apify dataset. Most successful records are tweet records.
Example tweet record:
{"recordType": "tweet","tweetId": "2064422103416238295","url": "https://x.com/NASA/status/2064422103416238295","text": "Pinned NASA @NASA Jun 9 Introducing Artemis III...","cleanText": "Pinned NASA @NASA Jun 9 Introducing Artemis III...","author": {"userName": "NASA","url": "https://x.com/NASA"},"isReply": false,"isRetweet": false,"isQuote": false,"discovery": {"inputType": "twitterHandles","handle": "NASA"},"scrapedAt": "2026-06-14T15:40:03.212Z"}
Depending on the page and settings, records may include:
- Tweet ID and URL
- Tweet text and cleaned text
- Author handle, name, URL, and profile details when available
- Creation time and language when available
- Reply, repost, quote, like, bookmark, and view counts when available
- Media metadata when available
- Discovery metadata showing which input produced the record
- Optional AI/RAG fields for search and chatbot workflows
- Optional raw data if includeRawData is enabled
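If you export nested records and want spreadsheet columns yourself, one common approach is to flatten nested objects into dotted column names. The sketch below shows that technique on a record shaped like the example; the dotted naming is an assumption and may not match what flattenOutput produces.

```python
import csv
import io


def flatten(record, prefix=""):
    """Turn nested dicts into a single level with dotted keys, e.g. author.userName."""
    flat = {}
    for key, value in record.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            flat.update(flatten(value, prefix=f"{name}."))
        else:
            flat[name] = value
    return flat


def to_csv(records):
    """Flatten records and write them as CSV text, one column per distinct key."""
    rows = [flatten(record) for record in records]
    columns = []
    for row in rows:
        for name in row:
            if name not in columns:
                columns.append(name)  # keep first-seen column order
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=columns)
    writer.writeheader()
    writer.writerows(rows)  # missing keys are written as empty cells
    return buffer.getvalue()
```

Passing the example tweet record through flatten yields columns such as author.userName, author.url, and discovery.handle.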
Warning and Error Records
When X/Twitter does not return usable public tweet data, the actor writes a clear warning record instead of silently failing.
| Code | What it means | What to try |
|---|---|---|
| X_ACCESS_WARNING | X returned a generic or restricted page. | Use http_first with browserFallbackOnEmpty: true. |
| NO_TWEETS_FOUND | The page loaded, but no public tweets were found. | Try a broader query, sort: "top", or browser fallback. |
| HTTP_FETCH_WARNING | The cheap HTTP request failed. | Retry with browser fallback or Apify Proxy. |
| REQUEST_FAILED | A browser request failed after retries. | Lower concurrency, raise timeout, or try a smaller run. |
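When post-processing a dataset, it can help to separate tweet records from warning records and attach the suggested fix from the table. The sketch below assumes warning records expose their code in a field named code, which this page does not confirm, and summarize is a hypothetical helper.

```python
from collections import Counter

# Code-to-advice mapping taken from the table above.
SUGGESTIONS = {
    "X_ACCESS_WARNING": "Use http_first with browserFallbackOnEmpty: true.",
    "NO_TWEETS_FOUND": 'Try a broader query, sort: "top", or browser fallback.',
    "HTTP_FETCH_WARNING": "Retry with browser fallback or Apify Proxy.",
    "REQUEST_FAILED": "Lower concurrency, raise timeout, or try a smaller run.",
}


def summarize(records):
    """Split a dataset into tweets, per-code warning counts, and matching advice."""
    tweets = [r for r in records if r.get("recordType") == "tweet"]
    # Assumption: warning records store their code under the "code" key.
    warning_counts = Counter(
        r["code"] for r in records if r.get("code") in SUGGESTIONS
    )
    advice = {code: SUGGESTIONS[code] for code in warning_counts}
    return tweets, warning_counts, advice
```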
Tips for Better Results
- Test with maxItems: 10 before running a larger job.
- If a search query returns few results, try sort: "top" or sort: "latest_and_top".
- If your query uses until, try removing it or using smaller date windows.
- Use dateSplitStrategy for long historical searches.
- Keep maxConcurrency low for X/Twitter pages.
- Enable browser fallback when HTTP-only runs return warning records.
- Use downloadMedia: false unless you really need downloaded files.
- Use outputFields if you only need a few columns.
Cost Notes
This actor is built with cost control in mind:
- It starts with http_only, the cheapest crawl strategy.
- It uses one concurrent request by default.
- It does not retry failed requests by default.
- It does not download media by default.
- It caps output with maxItems.
Browser fallback is more reliable, but it costs more because it launches a real browser. Use it when you need results and HTTP-only mode returns access warnings.
Limitations
- This actor only collects public data.
- It does not access protected/private accounts.
- It does not require or store X/Twitter login credentials.
- X/Twitter may hide, rate-limit, or change public pages at any time.
- Some fields are best-effort because X may not expose them on every page.
- Media downloading can increase runtime and storage usage.
- Conversation and search behavior depends on what X/Twitter publicly serves at run time.
Troubleshooting
I got X_ACCESS_WARNING
X likely returned a generic page instead of tweet data. Switch to:
{"crawlStrategy": "http_first","browserFallbackOnEmpty": true}
I got NO_TWEETS_FOUND
Try a less restrictive query, a public profile with recent posts, sort: "top", or browser fallback.
I got fewer results than expected
Check maxItems, maxItemsPerQuery, and maxItemsPerProfile. Also remember that X/Twitter may show different results depending on search mode, date range, location, and whether the page is loaded in a browser.
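A quick way to check whether the limits alone explain a short result is to estimate the most records an input could return. The arithmetic below is an assumption about how the three caps combine (per-source caps summed, then bounded by maxItems) based on their descriptions in the input table; it ignores tweetIds and the actor may count differently.

```python
def max_possible_records(run_input):
    """Estimate an upper bound on output size from the three limit fields."""
    per_query = run_input.get("maxItemsPerQuery", 100)
    per_profile = run_input.get("maxItemsPerProfile", 100)
    # Assumption: search terms and start URLs are each capped by maxItemsPerQuery.
    query_sources = len(run_input.get("searchTerms", [])) + len(
        run_input.get("startUrls", [])
    )
    # Assumption: each handle is capped by maxItemsPerProfile.
    profile_sources = len(run_input.get("twitterHandles", []))
    source_cap = query_sources * per_query + profile_sources * per_profile
    return min(run_input.get("maxItems", 100), source_cap)


# Two queries capped at 10 each cannot exceed 20 records, even with maxItems: 100.
print(max_possible_records(
    {"searchTerms": ["a", "b"], "maxItemsPerQuery": 10, "maxItems": 100}
))
```

If the estimate is already close to what you received, raise the per-query or per-profile limit before changing the crawl strategy.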
My run is too expensive
Lower maxItems, keep downloadMedia disabled, use maxConcurrency: 1, and start with http_only. Use browser fallback only when needed.
I see duplicate tweets
Keep deduplicateBy: "tweetId". If you use latest_and_top, the same tweet can appear in both searches, so deduplication is recommended.
For Developers
Run locally:
npm install
npm test
npm run build
npm run dev
Deploy to Apify:
apify push
Support
If results look wrong, include the run ID, input JSON, and a short description of what you expected. The run summary and dataset warning records usually show whether the issue came from input settings, public X/Twitter access limits, or page changes.